arXiv: 1505.04228vl [q-bio.PE] 16 May 2015 


FUNDAMENTAL LIMITS ON THE ACCURACY OF 
DEMOGRAPHIC INFERENCE BASED ON 
THE SAMPLE FREQUENCY SPECTRUM 


By Jonathan Terhorst* and Yun S. Song 1 ' 

University of California, Berkeley 

The sample frequency spectrum (SFS) of DNA sequences from a 
collection of individuals is a summary statistic which is commonly 
used for parametric inference in population genetics. Despite the 
popularity of SFS-based inference methods, currently little is known 
about the information-theoretic limit on the estimation accuracy as 
a function of sample size. Here, we show that using the SFS to esti¬ 
mate the size history of a population has a minimax error of at least 
0(1/logs), where s is the number of independent segregating sites 
used in the analysis. This rate is exponentially worse than known 
convergence rates for many classical estimation problems in statis¬ 
tics. Another surprising aspect of our theoretical bound is that it 
does not depend on the dimension of the SFS, which is related to the 
number of sampled individuals. This means that, for a fixed number s 
of segregating sites considered, using more individuals does not help 
to reduce the minimax error bound. Our result pertains to popula¬ 
tions that have experienced a bottleneck, and we argue that it can 
be expected to apply to many populations in nature. 


1. Introduction. The past decade has seen a revolution in our ability 
to interrogate the genome at the molecular level. Fueled by technological 
advances in DNA sequencing, studies now routinely query thousands or tens 
of thousands of individuals [1,19,28,31,32] in order to better understand 
disease susceptibility, heritability, population history, and other phenomena. 
In most cases, the conclusions of these studies come in the form of statistical 
estimates obtained from models that relate the effect of interest to muta¬ 
tion patterns arising in sampled DNA sequences. As genetic sample sizes 
explode, it is natural to wonder how additional data improve the quality 
of these estimates. While this general question has received intense focus 
in theoretical statistics, certain aspects of the genetics setting (for example, 
non-Gaussianity and lack of independence among samples) complicate efforts 
to study such models using classical techniques. New methods are needed to 
theoretically characterize some common models in statistical genetics. 
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Here, we address this need for a specific estimation problem in population 
genetics known as demographic inference. As we explain in further detail 
below, the aim of this problem is to reconstruct the sequence of historical 
events—including population size changes, migration, and admixture—that 
gave rise to present-day populations, using DNA samples obtained from those 
populations. We focus on the simplest problem of estimating the size history 
of a single population backwards in time. 

A summary statistic known as the sample frequency spectrum (SFS; de¬ 
fined below) is often employed in empirical studies [3, 5-8,10, 19, 20], but 
there have been fewer attempts to understand SFS-based estimation from a 
theoretical perspective. The main result of this paper is to show that, for a 
common class of estimators which analyze the SFS, there is a fundamental 
limit on their accuracy as a function of the sample size. More precisely, we 
show that, under a standard statistical error metric known as minimax error, 
the rate at which these estimators converge to the truth for certain popula¬ 
tions is at best inversely logarithmic in the number of independent segregat¬ 
ing sites analyzed, and does not depend at all on the number of individuals 
sampled. Compared to other types of statistical estimation problems (for ex¬ 
ample, linear regression), this is an extremely slow rate of convergence. Our 
proof is information-theoretic in nature and applies to any estimator that 
operates solely on the SFS. This is the first result we are aware of that char¬ 
acterizes the convergence rate of demographic history estimates as a function 
of sample size. 

The remainder of this paper is organized as follows. In Section 2 we for¬ 
mally define our notation and model. In Section 3 we state our main the¬ 
oretical results, followed by a discussion of their practical implications in 
Section 4. To streamline our exposition, all mathematical proofs are deferred 
until Section 5. 

2. Preliminaries. The stochastic process underlying the inference pro¬ 
cedure we consider is Kingman’s coalescent [13-15], which evolves backward 
in time and describes the genealogy of a collection of chromosomes randomly 
sampled from a population. The population size is assumed to change de¬ 
terministically over time and is described by a function r/ : [0, oo) —> (0, oo), 
with i](t) being the population size at time t in the past. The instantaneous 
rate of coalescence between any pair of lineages at time t is 1 /rj(t). 

As in the standard infinite-sites model of mutation [12], we assume that 
every dimorphic site (i.e., a site with exactly two observed allelic types) has 
experienced mutation exactly once in the evolutionary history relating the 
sample. Further, for each such site, we assume that it is known which allele 
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is the ancestral type versus the mutant type. In what follows, we use the 
terms dimorphic and segregating interchangeably. 

A population size function 77 (f) induces a probability distribution on the 

number of derived alleles found at a particular segregating site. Specifically, 

(v) 

for a sample of n > 2 randomly sampled individuals, let Q l. for 1 < b < n—1. 
denote the probability that a segregating site contains b mutant alleles in a 
sample of n individuals under model 77 . The vector == (^j ■ ■ ■ ■ ,Cnn_ 1 ) 
is called the expected SFS. In the coalescent setting, a general expression for 
is given by [9] 
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where EX^ denotes the amount of time (in coalescent units) during which 
the genealogy of the sample contained k lineages under model 77 . The ex- 

( 77 ) 

pected waiting time E Tm,m to the first coalescence in a sample of m individ¬ 
uals is given by 

(1) c£> = E T$ m = J t ^ exp {-a m R v (t)} dt, 


where a m = ('") and R v (t ) = f 0 ^jds is the cumulative rate of coalescence 
up to time t. It turns out [22] that there is an invertible linear transformation 
which relates (ET^j,ET^,... ,EX^) to = (c^\ c^\ ..., c$). Using 
this relation, the quantity can be written as [23] 
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where W n , 6 = {W n j,,2, • • •, W n ,b,n) and V n = {V n , 2 , ■ ■ ■, U n , n ) are vectors of 
universal constants that do not depend on the population size function 77 , and 
(-, •) denotes the pinner product. Under model 77 , the quantity (c^, W n &) is 
the total expected length of edges subtending b out of n individuals sampled 
at time 0, while the quantity (c^, V n ) is the total expected tree length for a 
sample of size n. Both quantities are positive for all population size functions 
77 . For an arbitrary population size function 77 , we have ^ 6=1 IF n,b,m = Vn,m 
for all 2 < m < n, which implies 


n —1 

^(cW,W B , t ) = (cW,V n ). 
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For a constant function 77 (f) = N, 


( 4 ) 

( 5 ) 


(c^,w n , b ) = In, 

{cW,V n ) = 2NH n -i, 


where H n _ 1 = Ylb=i b 

To formulate the problem, we employ the following notation. We suppose 
that a sample of n > 2 randomly sampled individuals has been typed at 
s independent segregating sites. These data are used to form the empirical 
sample frequency spectrum, which is an (n— l)-tuple (£ n i,..., £ nn _i), where 
denotes the proportion of segregating sites with b copies of the mutant 
allele and n—b copies of the ancestral allele. A frequency-based, estimator is 
any statistic 77 which maps an empirical SFS to a population size history. 

3. Main Results. Here, we establish a minimax lower bound on the 
ability of any estimator fj to accurately reconstruct population size functions. 


3.1. A general bound on the KL divergence between two SFS distributions. 
Abusing notation, we use D(j] || 7 /) to denote the Kullback-Leibler (KL) di- 

(» 7 ) (V) 

vergence between the probability distributions £f n and £„ . In Section 5, 
we prove the following general upper bound on the KL divergence between 
two SFS distributions: 

Theorem 1. Let Ai denote a general space of population size func¬ 
tions and suppose rj,rf £ A i satisfy 77 (f) = rj'(t) for all 0 < t < t c and 
max i>tc 77 (f) < min i>tc 77 '(f). Then, 

(cW) - c W,V n ) 

(6) D(i 7 II 7 /) < -4- r -^-, 

UUU- ( c (77) )Vn ) 


3.2. Bounds for a family of piecewise-constant models. We now focus on 
a particular class of population size functions which are easier to analyze 
and popular in the literature [2,3,16]. For a fixed positive integer I\ > 1, let 
Mr C Ai denote the space of piecewise-constant size functions with exactly 
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K pieces. A population size function 77 is a member of Mx if and only if 
there exist positive real numbers t\ < ■ ■ • < tx- 1 and N\, N2 ,..., Nx such 
that 

K 

( 7 ) y(t) = Y N k l{t k ~i < t < t k }, 

k =1 


where by convention we define to = 0 and tx 
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defined in (1) is given by 



Note that since tx = 00, 

( 10 ) e ~ a mS K — q , for all rj G Mx- 


To formulate our result, we let I, J denote positive integers that satisfy 
I + J = K, and introduce a subfamily C Mx of piecewise-constant 
functions defined as follows. See Figure 1 for illustration. We assume that 
all change points t± < ■ ■ ■ < are fixed and that the sizes Ni,... , Nj 

of the first I epochs are also fixed, with Nj being the smallest size. So, all 
functions in J~r,J are identical to each other for the first I epochs, and there 
is a population bottleneck in the last epoch. Then, for t>tj, every function 
77 G undergoes jumps according to the following rules: 

1. For the interval tj < t < t/+i, 77 (f) takes a constant value of either h 
or h + 8, where h > Nj and <5 > 0. 

2. At later change points {t/+i, • • ■ ,tl+J- 1 }, r l either stays the same or 
jumps upward by 8. 

Hence, Fj y j consists of 2 J distinct piecewise-constant functions that are non¬ 
decreasing functions of t for t > tj. Note that mint y(t) = Nj for all 77 £ 

def 

For ease of notation, we use e = Nj to denote the bottleneck size and 
tb = tj — tj-i to denote the bottleneck duration. To facilitate analysis later, 
we fix tj + j — f/+j-i to some positive constant ta for all j = 1,..., J — 1. 

For any two models in Fj^j, we obtain the following bound on the differ¬ 
ence of their waiting times to the first coalescence: 
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Fig 1. A family J-i,j of piecewise-constant population size models with K = I + J epochs. 


Lemma 2. For all 77 , rf G Tij, 

( 11 ) \ c ( r n)_ c (v')\< jJL e -a m r B /e_ 

@"m 

Then, together with Theorem 1, this lemma can be used to show 

Theorem 3. Let 77 , 7 / G Fj t j that satisfy maxt>j / r](t) < mint>tj rfft). 
Then, 

(12) D(rj || ?/) < J-e~ TB/£ . 

Our proofs of the above results are deferred to Section 5. It is interesting that 
the above bound does not depend on the number n of sampled individuals. 


3.3. Minimax lower bounds. Before using the above results to obtain a 
minimax lower bound, we first note a subtle fact. Given any population size 
function 77 , consider a function £ which satisfies £(t) = k ■ r](t/n ) for all 
t G [0, 00 ), where k is some positive constant. Such functions are equivalent, 
as it turns out that for all n > 2 and l< 6 <n— 1. To mod 

out by this equivalence, we assume that every 77 G M satisfies 77 ( 0 ) = N& x , 
where A% x is some fixed positive constant. 

Let H'll^ denote a generic norm (specific examples will be given later) and 
let IE, ? (-) denote expectation with respect to the SFS distribution ^ri' 1 = 
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(£n'i, ■ ■ ■ ,Cnh_i) induced by population size function rj. Then, note that 

inf sup E v \\f) — 7/H* > inf sup E v \\fj - r/W* > inf sup E v \\fj — 77 11* . 
v ‘n ti&Mk v v£Fi,j 

In what follows, we will put a lower bound on the last quantity. We first 
fix a sensible distance metric on AT An intuitive way to measure distance 
between two population size functions is their L\ distance, ||? 7 a — r]b\\i = 
/ 0 °° \r] a (t)—r]b(t)\dt, but this is unreasonably stringent in that ||ry 0 — r]b\\i = oo 
if rj a and % do not agree infinitely far back into the past. Instead we will focus 
on the following truncated L\ distance: \\rj a — f/b||i,T = /q T \Va(t) — rjb(t)\dt, 
which measures the discrepancy between r/ a and % back to some fixed time 
T in the past. 

Henceforth, let fj be any estimator of the population size function which 
operates on a sample of s independent segregating sites obtained from a sam¬ 
ple of n randomly sampled individuals. In Section 5, we prove the following 
main results of our paper: 

Theorem 4. Consider the subfamily J~r,j of models described above, and 
suppose J > 8 and T > + ta■ Then, 

(13) inf sup E v ||fj -rj\\ lT > Ct a ^ J -e Tfl/£ , 

V ' J s 

where C is a positive constant. 

The above theorem applies to all models in J~pj. We now consider the 
subset J-f[j = {rj G J~pj : \\rj\\oo < AT}, which is the set of all models in J 7 /.j 
that are bounded by some constant M. For this family of bounded population 
size functions, a sharper asymptotic lower bound can be obtained as follows. 

Theorem 5. Suppose J > 8 and T > + t a . Then, 

(14) inf sup E r] \\7)-r]\\ 1T >C ^ J ^ 

v ’ J logs 

where C is a positive constant. 

By specializing J- f [j, a simplified version of Theorem 5 can be obtained: 
Corollary 6. Suppose T > ti + j-\+T A and let J-f f k = U,/>i Then, 



where C" is a positive constant. 
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Note that the above lower bounds do not depend on the dimension of the 
SFS (which is equal to n — 1). Hence, for a fixed number s of segregating 
sites considered, using more individuals does not help to reduce the error 
bounds. 

4. Discussion. In this paper, we have theoretically characterized fun¬ 
damental limits on the accuracy of demographic inference from data. The 
paper that most closely relates to the present work is by Kim et al. [11], who 
obtain lower bounds on the amount of exact coalescence time data necessary 
to distinguish between size histories in a hypothesis testing framework. Since 
coalescence times are never observed and must be estimated from data, these 
bounds place a limit on the accuracy with which a population size function 
can be inferred. The authors also describe an estimator which uses coales¬ 
cence times (again observed without noise) to accurately recover the under¬ 
lying population size function with high probability, at a rate that roughly 
matches the lower bound. The results presented here are complementary to 
those of Kim et al. in the following sense: access to coalescence times is 
equivalent to knowing the site frequency spectrum, so that their results ap¬ 
ply to SFS-based estimators as well as other types of procedures which also 
incorporate e.g. linkage information. By focusing specifically on SFS-based 
estimators, we are able to more precisely quantify the effects of sample size 
and underlying demography on the ability to accurately reconstruct a pop¬ 
ulation size function. Ultimately, we derive minimax lower bounds on the 
error of estimating a size history using the SFS. 

Another line of work centers around the identifiability of the parameter 
r][t) using the SFS. Roughly speaking, a family of statistical models {Pe}eeB 
defined over a parameter space 0 is identifiable if for any 9 1,62 E 0 with 
9\ 7 ^ 62 , the sampling distributions induced by Po t and Pq 2 are different. In 
our context this simply says that, for all n, £ n 1 ' ) 7 ^ unless 771 = 772 almost 
everywhere. Standard desiderata for statistical estimators (e.g., consistency 
or unbiasedness) are impossible without identifiability, so it is the weakest 
possible regularity condition one can impose on a useful family of models. 

Perhaps surprisingly, it turns out that in general a population size func¬ 
tion is not identifiable from the SFS [18]. Indeed, for any given rj(t), it has 
been shown that an infinite number of smooth functions F(t) exist such that 
= £n +F ^ ■ Moreover, explicit examples can be constructed which demon¬ 
strate this phenomenon [18]. On the other hand, these counter-examples 
consist of functions which exhibit an unbounded frequency of oscillatory be¬ 
havior near the present time, which is perhaps unrealistic when modeling 
naturally occurring populations. More recently, it has been shown [2] that 
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identifiability holds for many classes of population size functions employed by 
practitioners (including piecewise-constant, -exponential and -generalized- 
exponential). Furthermore, the number n of sampled individuals sufficient 
for identifiability can be explicitly given and is a function of the complexity 
of the underlying class of models being studied [2], 

Identifiability asserts that, given an infinite amount of data (specifically, 
taking the number of segregating sites s —> oo), the model parameter rj(t) can 
be uniquely recovered. In practice, s is finite and only a perturbed version 
of the expected frequency spectrum, say \ is observed. From a practical 
standpoint, it is important to understand how these perturbations ultimately 
affect the parameter estimate It is this question that forms the starting 
point for the present work. 

We have shown that the minimax error rate for estimating the piecewise- 
constant demography of a single population is at least 0(1/ log s), where s is 
the number of independent segregating sites analyzed. In contrast, the min¬ 
imax error for many classical estimation problems in statistics (for example, 
non-parametric regression or density estimation) decays inverse-polynomially 
in the sample size [29]. Compared with these problems, exponentially more 
samples would be required to estimate a population size history function to 
within a similar magnitude of error. 

A single population evolving under a piecewise-constant demography is 
a special case of many richer classes of demographic models. For example, 
it is a (limiting) member of the family of exponential growth models, seen 
by taking each exponential growth parameter to zero. In the multi-species 
coalescent setting [4,6], multiple population size histories must be estimated, 
and the error of that estimate must necessarily be lower bounded by that of 
estimating a single such history. Thus, our result can be expected to apply 
to a broader class of models than the one we have studied here. 

As detailed in Section 5, the result in Theorem 5 follows from setting 
e = tb / log s and 5 oc | exp(rs/e) in the subfamily J r /j. The size rs/logs 
is in coalescent units. In terms of the number of individuals, it is proportional 
to g^/logs, where gg is the number of generations corresponding to dura¬ 
tion tb in the coalescent limit. Intuitively, as the severity of the bottleneck 
increases, the population is increasingly likely to find its most recent common 
ancestor (MRCA) during that time; further back in time than the MRCA, 
no information is conveyed concerning the demographic events experienced 
by the population. 

One might object to considering models with a bottleneck size that scales 
inversely with the number s of segregating sites in the data, and it is indeed 
possible that a better convergence rate may be achievable for populations 
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which are known not to contain a bottleneck. On the other hand, we note that 
1 /logs decreases sufficiently slowly with s that our result can be expected 
to apply to many real-world examples. For example, for s ~ 10 s , which is 
a conservative upper bound for most organisms, gg /logs ~ 0.054 gs- This 
implies that for populations which have experienced roughly an order-of- 
magnitude increase in effective population size during their history, accurate 
estimation of demographic events which occurred before this expansion is 
difficult using SFS-based methods. Additionally, an interesting aspect of our 
work is that our minimax lower bounds do not depend on the number n of 
sampled individuals; increasing n is not enough to overcome the information 
barrier imposed by the presence of a bottleneck. This is intuitively plausible 
since, as n increases, the (n + l)th sampled lineage becomes more likely to 
coalesce early on. 

An interesting question which we have not attempted to analyze is whether 
the 0 (1/ logs) rate is optimal, i.e. whether there exists some estimator rj(t) 
which achieves the minimax lower bound established here. In practice, from 
equations (2), ( 8 ), and (9), it can be seen that naively maximizing the likeli¬ 
hood of the observed SFS with respect to 77 (f) requires solving a non-convex 
optimization problem, so that convergence to the global maximum is not even 
guaranteed. Computational issues aside, finding such an estimator remains 
an open theoretical challenge. 

In closing, we stress that our result is specific to SFS-based estimators, 
which analyze only independent sites. The main allure of these estimators 
is their mathematical tractability, rather than their realism. In fact, a rich 
source of additional information exists in the correlation structure found 
among linked sites in the genome. Methods which seek to exploit this struc¬ 
ture by modeling the action of recombination pose greater mathematical 
and computational difficulties, but there has been recent progress in this 
area [16,21,24-27]. Our result serves to underscore the importance of pursu¬ 
ing ever more realistic models of genomic evolution, challenging though they 
may be. 


5. Proofs. 

Proof of Theorem 1. To simplify the notation, we write c = c ^ and 
z! = c'^ '. Then, using (2) we can write 
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The assumption rniiq^ rj'(t) > ma ^-t>t c h(^) implies that, for all times t, t' > 
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t c , the instantaneous rate of coalescence at time t in model r/ is > the instan¬ 
taneous rate of coalescence at time t! in model r/'. Hence, this assumption 
together with rj(t ) = r]'(t) for all 0 < t < t c implies (c — c', W n 5 } < 0 for all 

1 < b < n— 1 ; equivalently, log ^ w”'’) ) < Additionally, 

and log(l + x) < x for all x > —1. Combining these facts, we obtain 


> -1 


71—1 


D(v\W) < J2^n,b l °S 


6=1 


(c',V n ) 


n—1 


< \ _ _ _ 

(c,V n )J~^ n ’ b <c,V n > (c,V„) 

where we have used X^ 6 =i = 1 hi the hnal equality. 


(j?) (c 7 - c, V„) _ (c 1 - c, V. n 
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PROOF of Lemma 2 . We distinguish two particular models rp, r/ u E J 7 /,,/ 
which are the lower and the upper envelopes of J~pj. The function rp stays 
constant at h for all t > tj, while r] u jumps upward by S at every change 
point tj, ..., Hence, rp < r] < rj u pointwise for all 7] E J-pj. The two 

enveloping functions will form the basis of subsequent analysis. 

Fix 77, r / E J~r,j and note that, by the definition of one of these 

functions must pointwise-dominate the other. So, assume without loss of 
generality that 77 (f) < rj'(t) for all t. Then for all t, 


< v(t) < v'(t) < Vu(t), 


which implies 


Jve) < Jv) < M) < r (Vu) 
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for all m = 2,..., n. Using these inequalities, we conclude 
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so it suffices to demonstrate (11) for cfn l Now, by equation (9) and 

the definition of 77 ^, 
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where we have used equation (10). Similarly, 
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Now, using the fact that % and agree on the hrst / epochs, we obtain 


M*-’ * *>] = 

1=1 
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(Vu) 
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where the second line follows from telescoping and the fact Sj+j = oo, while 
the last line follows from the fact that 1 -jr < for all j = 1, ..., J. [Z 

PROOF of Theorem 3. For ease of notation, define c = and c' = 
c^'\ By Lemma 2, 


(c C,V n ) — ^ ' (c m Crn)Vn,m Z T5 ^ ( 




n > m c ~a m T b/e 


m =2 


m=2 


< J5e~ TB / £ J2 


V n 


m =2 


where the second inequality follows from e~ arnTB ^ £ < e _Ts/,£ for all m = 
2 ,n. Now, noting that X^m =2 ee 221 corres P on ds to the total tree length 
for the constant population size function ?y = 1 and using (5), we obtain 

(c'-c,V n ) < J8e~ TB / £ 2H n - 1 . 


(16) 
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To finish the proof, recall that (c, V n ) is the total expected branch length 
of the coalescent tree under model r]. Since rnirp rj(t ) = e, we have that 
(c, V n ) is at least as large as the corresponding quantity under a model with 
constant population size e. By (5), the total expected tree length under the 
latter model equals 2eH n _ j. Thus, (c, V n ) > 2eH n ^\ and combining this 
result with (16) gives 


(c ; - c, V w ) 
<c,V n ) 


< J-e~ TB/£ . 
£ 


Finally, (12) follows from this inequality and Theorem 1. 


□ 


Proof of Theorem 4. Our proof uses a generalized form of Fano’s in¬ 
equality [30]. Adapted to our setting and notation, the method reads as 
follows. 


Theorem 7 (Fano’s method). Consider a space M of population size 
models. Let r >2 be an integer, and let Sf z = {771 , 772 ,... ,g r } C M. contain 
r population size functions such that for all a 7 ^ b, \\rj a — 77 *, ||* > a r and 
< Pr- Let fj( n,s ^ = ff n,s \X\,... ,X S ) be an estimator ofrj based 
on the SFS data X\ ,..., X s sampled independently from fn , *.e. ; X ±,..., X s 
are SFS data for n individuals at s independent segregating sites. Then, 


(17) 


inf sup Ejfj^ 
V rj&M 


> Or ( 1 _ s ■ Pr + log2 


logr 


This theorem places a lower bound on the minimax rate of convergence of 
a population size history estimator based on the SFS. 

For rj £ Fjj, let Wj denote the variable E {0,1} indicating whether rj 
jumps by 5 at change point tj +1 . Let y = {w = (wq, ..., wj-i) \ Wi E 
{0,1}}, where J > 8. By the Varshamov-Gilbert lemma (see [17, Lemma 
4.7]), there exist X = {w°,...,w M } C T such (i) w° = (0,..., 0), (ii) 
M > 2 J / 8 , and (iii) F[( w*,w J ) > Jj 8, where denotes the Hamming 

distance. 

Let J-fj denote the subset of 2 J / 8 + 1 functions in Fj.j with the indicator 
variable for (5-jumps at tj,... given by w E X. Then, for any two 

Va^Vb G Xf j, we have 


ha-Vb\\i,T > g - T A-S. 


( 18 ) 
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Using Theorem 7 via (18) and Theorem 3, we obtain 


inf sup Ejfj^ 
p n£Fi,j 


(19) 


V\\l,T > 


J -T A -5 
16 


sJ^e Ts / £ + log2 
log(2 J / 8 + 1) 


J -t a - 6 
16 


sJ^e Ts//£ + log2 
| log 2 


We now optimize the bound with respect to 5. A straightforward calculation 
shows that the maximum is attained at 


( 20 ) 


_ (J-8) log 2 /ex /£ 
16 J \s) 


and setting 6 = 6* in (19) yields the result. 


□ 


PROOF of Theorem 5. The result is obtained by scaling e with the 
number of segregating sites s. Denote this scaling by e(s); we will determine 
e(s) which produces the largest possible lower bound. Starting from Equa¬ 
tion (20) in the proof of Theorem 4, note that 5* scales as -e TS//£ =: /(e). 
In order to satisfy the constraint that ||?]||oo < Tf for all r/ £ Jq'j and s, the 
condition 

(21) hmsupmax j^^e Ts / £ l s ),e(s) j < oo 

s—/ oo ^ S ) 


must therefore hold. This implies that s(s)s p —> oo as s —> oo for all p > 0. 
Suppose that q = liminf^^oo £ ( £ ^ ogs < 1; note that e(s) > 0 implies q > 0. 
Then there exists a diverging sequence si, S 2 , • • • —> oo with log(s,) < LU? 
for all i. whence 


limSUp £ JA e r B /e{s) > limsup £M e T^ log ( s b 
s—/oo S i —/oo 


limsupe(sj)sl +9 = oo. 

/OO 


From this it follows that e(s) > T^/logs for sufficiently large s. Now, on 
the interval (0,oo), the function /(e) is convex with a unique minimum at 
e = tb- Let s' be a point where f(s') > /(rs/logs) = r^/logs. Then 
£ ' $ [ufs> r s]- If £> > T Bi then f(s') < ^e 1 . Since ^ < f(s'), we then 
conclude s' > fT B , which is not bounded as s —> oo. 

In summary, we see that the largest possible lower bound which obeys (21) 
must have /(e) asymptotically < r^/logs, and that this bound is achieved 
by setting e(s) = re/logs. Plugging this in to equation (17) yields the 
claim. □ 
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PROOF of COROLLARY 6 . For c 6 (0,1), choose J large enough so that 
(J — 8)/J > c, and fix ta so that T = ti + Jta- Then (J — 8 )ta > cJta = 
c(T — t/). Substituting the above inequalities into equation (14) and letting 
C" = C'c 2 yields the desired result. □ 
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